Abstract

Automatic segmentation algorithms have made neural reconstruction
from Electron Microscopy (EM) images a realistic prospect. Although
segmentation algorithms eliminate the necessity of tracing the neurons
by hand, significant manual effort is still essential for correcting
the mistakes they make. A considerable amount of human labor is also
required for annotating groundtruth volumes for training the
classifiers of a segmentation framework. It is critically important to
diminish the dependence on human interaction in the overall
reconstruction system. This study proposes a novel classifier training
algorithm for EM segmentation aimed at reducing the amount of manual
effort demanded by the groundtruth annotation and error refinement
tasks. Instead of using an exhaustive pixel-level groundtruth, an
active learning algorithm is proposed for sparse labeling of pixels and
superpixel boundaries. Because over-segmentation errors are in general
more tolerable and easier to correct than under-segmentation errors,
our algorithm is designed to prioritize minimization of false merges
over false splits. Our experiments on both 2D and 3D data suggest that
the proposed method yields segmentation outputs that are more amenable
to neural reconstruction than those of existing methods.


1. Introduction 

One important task for neural reconstruction from Electron Microscopy
(EM) is to extract the anatomical structure of a neuron by accurately
assigning regions of EM images to corresponding cells. Due to the size
and number of EM images typically required for a useful dense
reconstruction, it is impractical to perform such a task manually.
Recent studies on neural reconstruction, or connectomics, [32] [15]
apply automated segmentation algorithms for determining cell
morphology. The result of such an automated segmentation algorithm is
not free of errors, which is why a reconstruction approach must either
manually correct the mistakes made by these algorithms [32], or conform
them to a skeleton representation generated earlier by hand [15].

In addition, there have been many notable works addressing one or more
of the processes constituting an overall segmentation algorithm.
Existing algorithms such as [17] [9] [22] for pixel classification;
[25] [24] for effective generation of over-segmentation; [30] [19] [3]
for isotropic 3D supervoxel clustering; and [33] [11] for
co-segmentation of anisotropic data report impressive performances on
different kinds of EM datasets. Many of these novel approaches are
motivated by methods in natural image segmentation and evaluate output
accuracy using error measures popular in the computer vision
literature, e.g., Rand Error (RE) of [19], Variation of Information
(VI) of [3] [30].

Ideally, an automated segmentation should attain 100% accuracy: its
output should be free of both types of segmentation errors, namely
false merge (under-segmentation) and false split (over-segmentation).
However, it is not realistic to expect (near) 100% accuracy in
practice; given the performances of the existing state-of-the-art
algorithms, one can generally assume that their outputs need to be
corrected afterwards. Then, from a connectomics point of view, a
segmentation algorithm should be designed to minimize the manual labor
(or algorithmic complexity) required for correcting its output [18].

To the best of our knowledge, there has not yet been a study analyzing
the effect of segmentation errors on the effort necessary to correct
them. Although error quantities, such as Rand Error (RE) [19], provide
a coarse assessment of the mistakes an algorithm makes, they are unable
to conclusively forecast the amount of work required for refinement. As
an example, inaccurately combining two regions of sizes A and B would
incur the same RE value as incorrectly splitting one region of size
A + B into two parts. However, rectifying these two mistakes demands
significantly different amounts of work [7]. The high RE of a false
split of two large bodies, e.g., A = B = 10000, on a 512 x 512 image
overstates the effort required to correct such an error.

(a) Plane 50 (b) Plane 220 (c) Plane 484

Figure 1. 3D segmentation output on 3 planes from a volume of 500
images. Each individual neuron has been colorized with a different
color. The two adjacent regions, colored in white, in the top-right,
are in fact parts of two different neurons which have been falsely
merged. Manually correcting this under-segmentation error is much more
labor-intensive than correcting a false split.
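The symmetry of RE under merges and splits can be checked with a few lines of code. The sketch below assumes the plain pair-counting definition of Rand Error (the fraction of point pairs whose co-membership differs between the two labelings); the function name and the toy region sizes are illustrative only.

```python
from itertools import combinations

def rand_error(gt, sg):
    """Fraction of point pairs on which two labelings disagree."""
    pairs = list(combinations(range(len(gt)), 2))
    # A pair is an error if it is joined in one labeling but not the other.
    bad = sum((gt[i] == gt[j]) != (sg[i] == sg[j]) for i, j in pairs)
    return bad / len(pairs)

A, B = 30, 20
two_cells = [0] * A + [1] * B   # two regions of sizes A and B
one_cell = [0] * (A + B)        # a single region of size A + B

false_merge = rand_error(gt=two_cells, sg=one_cell)
false_split = rand_error(gt=one_cell, sg=two_cells)
```

Both mistakes miscount exactly the A x B cross pairs, so the two values coincide even though the correction effort differs.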

From a reconstruction perspective, an over-segmented result is
preferred over an under-segmented one because a fragmented set of
regions can be refined by automated methods such as agglomeration
[28] [29] [30] or co-segmentation [11], but an under-segmented region
can only be fixed by a human expert. Even for a human expert,
identifying and correcting false merges is more difficult than
correcting false splits [7]. This difficulty is more pronounced in 3D
volume segmentation than it is in 2D segmentation. Consider separating
the two regions falsely connected through 450 planes (from 50 to 500)
of a $520^3$ volume by a segmentation method, as displayed in Figure 1.
The authors of [28] [29] [30] were aware of this issue and reported the
two types of error rates separately for performance assessment. The
study of [17] attempts to reduce false merges by identifying the
locations vital for preserving topology given exhaustive groundtruth of
the data.

Another desirable property of EM segmentation algorithms is the ability
to train the necessary components efficiently without compromising
accuracy. Efficient training is perhaps essential for large-scale
reconstruction, where one may anticipate learning the predictors
multiple times for different neuropils. A quick segmentation result may
also help the neurobiologist decide on the sample preparation that
would maximize segmentation accuracy. However, training existing
segmentation algorithms [17] [9] [22] remains a significant bottleneck
in connectomics [14] due to the time and effort necessary for
generating the groundtruth and the time complexity of training the
classifier (e.g., deep neural networks).

A highly curated exhaustive groundtruth, such as those offered by the
segmentation challenges (e.g., ISBI 2012 2D, SNEMI 2013 3D), demands
extensive effort. Provided the necessary resources, it is possible to
generate a reasonable groundtruth by iteratively refining the
segmentation on a small volume with an interactive labeling tool such
as ilastik [31]. This label set is expected to contain a small degree
of tolerable noise but is efficient to generate. Some recent algorithms
[3] [28] [29] [30] have utilized interactively generated groundtruth to
train the necessary tools for segmentation. However, these algorithms
inherently rely on highly expert annotators or neurobiologists in order
to produce a useful annotation efficiently (by finding out the minimal
area to label for the prediction-correction scheme). Automated
algorithms are expected to diminish such dependency on human expertise.
As an alternative to exhaustive labeling, Jones et al. [21] presented a
method for sparsely labeling the membrane locations based on appearance
similarity to user-annotated examples. A completely semisupervised
approach like [21] will be sensitive to the penalty parameter and risks
introducing noise that is too difficult for a classifier to tolerate.

(a) (b) (c) (d)

Figure 2. Workflow of a standard EM segmentation framework: (a) input,
(b) pixelwise classification (white: membrane, black: non-membrane),
(c) over-segmentation, (d) final segmentation.

We adopt a standard EM segmentation approach [19] [3] [28] [30], as
illustrated in Figure 2, where the confidence values of a pixelwise
classifier^a are utilized to generate an initial over-segmentation of
the dataset. The over-segmented image or volume is then refined by
aggregating superpixels with the help of a superpixel boundary
classifier. In this paper, we propose an algorithm for training pixel
and superpixel boundary classifiers. The classifiers are trained to
attain two desirable properties of an EM segmentation method:

1. Maximize efficiency: the proposed algorithm employs active learning
for classification. Instead of requiring an exhaustive pixel-level
groundtruth, our algorithm automatically determines a small fraction of
samples that are critical for training the pixel and superpixel
boundary classifiers (< 1% for pixel and < 20% for superpixel
boundary). These examples are identified using the disagreement between
two predictors: a) a classifier being updated iteratively, and b) a
semisupervised label propagation algorithm [6] predicting labels based
on feature similarity. Unlike [21], all our training examples are
labeled by an annotator.

2. Minimize false merges: without exhaustive groundtruth, it is not
possible to locate the topologically critical pixels using the method
of [17]. We hypothesize that emphasizing the detection of membrane
pixels over other types would reduce the amount of false merges.
Accordingly, our training protocol is designed to be biased towards
more accurate learning of the membrane class than of the remaining
categories.

We empirically demonstrate the advantages of the proposed method over
the state-of-the-art techniques for neural reconstruction from both 2D
and 3D EM data. The overall active learning algorithm is defined in
Section 2. Sections 2.2 and 2.3 explain how our active training
approach is adapted for pixel and superpixel boundary classification.
The following section (Section 3) discusses the experimental setup and
reports the results. Finally, Section 4 concludes with a discussion
summarizing our findings.

^a We adopt multi-class pixel classification, as is explained later.

2. Proposed Active Labeling Framework 

The segmentation scheme we adopt consists of pixel classification
followed by superpixel clustering by means of a superpixel boundary
classifier. We propose an active strategy to train both the pixel and
superpixel boundary classifiers. The goal of an active learning method
is to identify a few examples, crucial for training a classifier, from
a pool of unlabeled samples. The proposed active classification scheme
identifies the challenging examples from the dataset and requests their
labels from the user. Given the labels for the query examples, the
algorithm reconfigures its predictors and identifies a new set of
queries in a repetitive fashion.

With the aim of locating these challenging examples, we estimate the
class label of any unlabeled point by two predictors having
substantially different views of the dataset. One predictor is a
classifier (Random Forest (RF) [5] in our experiments) trained from an
initially available subset of datapoints
$X_l \subset X = \{x_1, \dots, x_n\}$ and their labels $Y_l$. The other
predictor is a novel variant of the semisupervised label propagation
algorithm [36] [6], which assumes a cluster formation of similar
datapoints in feature space. While the classifier assesses the class of
an unlabeled example by a discriminative set of rules learned so far,
the label propagation technique extrapolates a prediction based on
feature similarity among the datapoints.

A training sample is considered to be challenging if the class
suggested by feature similarity is different from that calculated by
the discriminative rules. For the interested reader, we illustrate the
intuition behind our query generation approach on the synthetic
two-moon dataset in Figure 3. Provided the same set of labeled
examples, circled in black in Figure 3(b) and Figure 3(c), the label
propagation can correctly extrapolate the labels of the rest of the
datapoints utilizing feature similarity (here Euclidean distance
between points), whereas a classifier, such as RF, will be unable to
infer the class separation. Our method would select some samples where
the two predictions differ (marked by blue diamonds) as the next set of
queries.

The disagreement between these two types of estimates is quantified by
a ranking formula. The first few examples in descending order of the
disagreement measure are presented to the user as queries. The set
$X_l$ is augmented by these newly annotated queries and the whole
process is repeated until a predefined stopping criterion is satisfied.
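The query loop described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments: the function `active_learning`, its arguments, and the two toy predictors (a 1-D threshold rule standing in for the Random Forest and a nearest-labeled-neighbor rule standing in for label propagation) are all hypothetical, and a fixed number of rounds replaces the stopping criterion.

```python
def active_learning(points, oracle, initial, fit_classifier, propagate,
                    disagreement, query_size=10, max_rounds=5):
    """Disagreement-driven query loop (illustrative sketch only)."""
    labeled = {i: oracle(i) for i in initial}          # index -> label
    for _ in range(max_rounds):
        clf = fit_classifier(points, labeled)           # discriminative view
        prop = propagate(points, labeled)               # similarity view
        pool = [i for i in range(len(points)) if i not in labeled]
        if not pool:
            break
        # Rank unlabeled points by how strongly the two predictors disagree.
        pool.sort(key=lambda i: disagreement(clf(i), prop(i)), reverse=True)
        for i in pool[:query_size]:
            labeled[i] = oracle(i)                      # ask the annotator
    return labeled


# Toy stand-ins for the two predictors on 1-D points (hypothetical).
def threshold_classifier(points, labeled):
    zeros = [points[i] for i, y in labeled.items() if y == 0]
    ones = [points[i] for i, y in labeled.items() if y == 1]
    thr = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda i: 1.0 if points[i] > thr else 0.0

def nearest_label(points, labeled):
    def predict(i):
        j = min(labeled, key=lambda j: abs(points[j] - points[i]))
        return float(labeled[j])
    return predict

points = [k / 20 for k in range(21)]
truth = lambda i: int(points[i] > 0.5)                  # simulated annotator
result = active_learning(points, truth, [0, 20], threshold_classifier,
                         nearest_label, lambda a, b: (a - b) ** 2,
                         query_size=2, max_rounds=3)
```

Only the ranking step depends on the disagreement measure, so the pixel and superpixel variants can share the same loop and differ in the function passed as `disagreement`.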



(a) Actual class labels (b) Label Prop output (c) RF prediction 

Figure 3. (a) synthetic two-moon dataset with + and . representing two classes; (b), 
(c) predictions from label propagation and random forest classifier respectively given 
labeled examples circled in black. Blue diamonds mark new queries. 


In Section 2.1, we propose the semisupervised label propagation method
for a multiclass setting to facilitate the multiclass approaches of
[29] [28]. The strategies for query generation and initialization are
different for pixel and superpixel boundary classification and are
explained in Sections 2.2 and 2.3 respectively.


2.1. Proposed Multiclass Label Propagation 


Let us suppose we have $n$ datapoints $x_i$ that we wish to classify
into one of $k$ classes. Let $f_i$ denote the indicator vector for
datapoint $x_i$: $f_i^c = 1$ if $x_i$ is classified to class $c$ and
the rest of its values are 0. We wish to assign 'similar' datapoints to
the same class, i.e., the pairs of samples $x_i$ and $x_j$ with large
feature similarity, quantified by $w_{ij}$, should belong to the same
class. We propose to attain this by minimizing the following cost.
In this cost function, we normalize the weight $w_{ij}$ by the
corresponding degree $d_i = \sum_j w_{ij}$ to balance the effects of
disparity in class sample size. The cost is summed over all neighboring
$i \sim j$ that possess a feature similarity above a certain predefined
value. Using a matrix notation for the indicator variables,
$F = [f_1, \dots, f_n]^T$, we can write this cost function as

$$J(F) = 2\,\mathrm{Tr}\{F F^T (I - D^{-0.5} W D^{-0.5})\} \qquad (3)$$

where $I$ and $D$ are the identity and diagonal degree matrices
respectively. By relaxing the values of $F$ to be nonnegative
real-valued numbers $f_i^c \ge 0$ and differentiating with respect to
$F$, one can compute the system of linear equations that needs to be
solved for determining $F$. Of course, the minimization is constrained
by label consistency among the values of $f_i$, i.e.,
$F\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is a vector of all 1's.

$$\frac{\partial J}{\partial F} = 0 \;\Rightarrow\; (I - D^{-0.5} W D^{-0.5})\,F = 0 \qquad (4)$$
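As a consistency check, the trace form of Equation 3 can be expanded into a sum over pairs of datapoints. The derivation below is a sketch that assumes $W$ is symmetric and uses $d_i = \sum_j w_{ij}$; the shorthand $S = D^{-0.5} W D^{-0.5}$ is introduced here for illustration only.

```latex
\begin{aligned}
\mathrm{Tr}\{F F^T (I - S)\}
  &= \sum_{c=1}^{k} \Big[ \sum_{i} (f_i^c)^2
     - \sum_{i,j} \frac{w_{ij}}{\sqrt{d_i d_j}}\, f_i^c f_j^c \Big] \\
  &= \frac{1}{2} \sum_{c=1}^{k} \sum_{i,j} w_{ij}
     \Big( \frac{f_i^c}{\sqrt{d_i}} - \frac{f_j^c}{\sqrt{d_j}} \Big)^2
\end{aligned}
```

Under these assumptions, $J(F)$ equals the degree-normalized squared difference of indicators summed over all ordered pairs $(i, j)$, so a small cost means that strongly connected datapoints receive similar normalized indicators.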


Table 1. Multiclass Label Propagation algorithm

Algorithm: Multiclass Label Propagation
repeat
  1. Set the known labels in F.
  2. Update solution by Equation 5.
  3. Project onto F1 = 1.
until convergence

An efficient solver for Equation 4 is essential to build an interactive
interface for our method. By avoiding the factorization of matrices
with thousands of variables, iterative techniques can produce a
solution significantly faster than closed-form methods with the same
level of accuracy (up to a certain error tolerance). A stationary
iterative formulation of this equation would repeatedly update the
solution using the following formula [23].

$$F^{next} = D^{-0.5} W D^{-0.5} F \qquad (5)$$

This iteration will converge if: 1) the absolute value of the
eigenvalues of $D^{-0.5} W D^{-0.5}$ is bounded by 1, and 2)
$I - D^{-0.5} W D^{-0.5}$ is non-singular [23]. Since there is no
bipartite connected component in the graph corresponding to $W$, the
first condition is satisfied [8]. We add a small perturbation to the
quantity $I - D^{-0.5} W D^{-0.5}$ to attain non-singularity. One must
also satisfy the label consistency constraint
$F\mathbf{1} = \mathbf{1}$ to reach a meaningful solution.

In our active learning setting, the algorithm is given the labels for
$m$ out of $n$ examples (where $m \ll n$) at the beginning of the
process. We set the known labels in $F$ and iterate Equation 5,
followed by a projection onto $F\mathbf{1} = \mathbf{1}$, until
convergence for computing the unknown label confidences. The algorithm
is outlined in Table 1 and is similar to a past approach for efficient
label propagation on large datasets [35].

After a query set is annotated by the user, the linear equations in
Equation 4 need to be solved again. Instead of starting the solver
algorithm (Table 1) from scratch, we begin with the most recently
converged $F$ as the initial solution. Such a warm start brought about
a significant speed-up without altering the output in our experiments.
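The iteration of Table 1 and the warm start can be sketched as follows. This is a small pure-Python illustration under stated assumptions, not the solver used in the experiments: the projection onto F1 = 1 is approximated by renormalizing each row, the small perturbation mentioned above is omitted, and the function name `label_propagation`, its arguments and the six-node toy graph are hypothetical.

```python
def label_propagation(W, known, k, F0=None, tol=1e-9, max_iter=1000):
    """Iterate Equation 5 with clamped labels; returns (F, iterations).

    W is a symmetric list-of-lists of weights with positive row sums and
    known maps a datapoint index to its class in range(k)."""
    n = len(W)
    d = [sum(row) for row in W]                         # degrees d_i
    S = [[W[i][j] / (d[i] * d[j]) ** 0.5 for j in range(n)]
         for i in range(n)]

    def clamp(F):                                       # set the known labels
        for i, c in known.items():
            F[i] = [1.0 if a == c else 0.0 for a in range(k)]

    F = [row[:] for row in F0] if F0 else [[1.0 / k] * k for _ in range(n)]
    clamp(F)
    for it in range(1, max_iter + 1):
        # Equation 5: F_next = D^-0.5 W D^-0.5 F
        G = [[sum(S[i][j] * F[j][a] for j in range(n)) for a in range(k)]
             for i in range(n)]
        for i in range(n):                              # make rows sum to 1
            s = sum(G[i])
            G[i] = [g / s for g in G[i]] if s > 0 else [1.0 / k] * k
        clamp(G)
        change = max(abs(G[i][a] - F[i][a])
                     for i in range(n) for a in range(k))
        F = G
        if change < tol:
            break
    return F, it

# Two tightly connected triangles joined by one weak edge (2 -- 3).
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
F, cold_iters = label_propagation(W, {0: 0, 5: 1}, k=2)
labels = [max(range(2), key=lambda c: row[c]) for row in F]

# A new annotation arrives for point 2: restart from the converged F.
F_warm, warm_iters = label_propagation(W, {0: 0, 5: 1, 2: 0}, k=2, F0=F)
_, scratch_iters = label_propagation(W, {0: 0, 5: 1, 2: 0}, k=2)
```

With one label per triangle, the propagated labels follow the cluster structure, and restarting from the previous solution needs no more iterations than restarting from a uniform initialization on this toy graph.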

2.2. Active Learning for Pixel Classification 

In pixel classification, each datapoint $x_i$ of the above formulation
corresponds to a pixel. We will denote a pixel by a different literal
$u_i$ to distinguish it from the superpixel boundary defined later. In
our design, each pixel is classified into one of four classes:
membrane, cytoplasm, mitochondria, mitochondria border [29].

Initial Subset Selection: Equal-size subsets of samples, one for each
class, are selected from the dataset to constitute the initial dataset
$X_l$ for label propagation. In the interactive setting, the user will
be required to select the initial $X_l$ using a GUI.

In an attempt to maximize the detection of membrane pixels, the initial
training set for the RF classifier is constructed from a subset of
$X_l$ that contains different numbers of examples for different
classes. In the following text, we describe how the pairwise similarity
values in $W$ are utilized to determine the sample proportion for
different categories.


Introducing indicator vectors $a^m$ and $a^o$ for membrane class $m$
and other classes $o$ respectively, one can determine the sample
proportion by solving an optimization problem. The value of
$a_i^m = 1$ if the $i$-th membrane example in $X_l$ is selected and
$a_i^m = 0$ otherwise. The following formulation will select the
largest subset of initial samples that will prevent misclassification
of any member of class $m$ in a nearest neighbor classifier setting.

Here, $y_i \in \{m, o\}$ indicates whether $u_i$ belongs to the
membrane or other categories. In practice, we compute a suboptimal
solution to this problem for efficiency. In our solution, $a_i^m = 1$
for all $i$. We then greedily select examples for each class $o$ to
increase Cut(m, o) as long as Vol(o) < Vol(m); we refer the reader to
[27] for the definitions of these terms and for the motivation behind
our heuristics.

Query Generation: Let the vector $p_i$ denote the prediction
confidences generated by the classifier for an unlabeled pixel $u_i$,
where $p_i^c$ corresponds to the confidence towards class $c$. If one
wishes to compute the over-segmentation from the classifier probability
for the membrane class $p_i^m$, it is favorable to have
$p_i^m \gg p_i^o$, $o \ne m$, for all membrane pixels. For a pixel
$u_i$ from the other classes, the deviance $p_i^o - p_i^m$ should be
maximized instead. We define a margin vector $\bar{p}_i$ with respect
to class $m$ consisting of these quantities as follows.

$$a = \arg\max_c p_i^c, \qquad
\bar{p}_i^c = \begin{cases}
0, & c \ne a \\
p_i^m, & c = a = m \\
p_i^a - p_i^m, & c = a \ne m
\end{cases} \qquad (7)$$

Let $\bar{g}_i$ be the margin with respect to class $m$ computed for
$u_i$ in a similar fashion from the real-valued outputs of the
multiclass label propagation algorithm. The disagreement
$\delta_{pixel}(u_i)$ between these two estimates is computed as the
dot product of their difference with itself.

$$\delta_{pixel}(u_i) = (\bar{g}_i - \bar{p}_i)^T (\bar{g}_i - \bar{p}_i) \qquad (8)$$

The margins $\bar{g}_i$ and $\bar{p}_i$ are modeled to capture the
overlap in confidence the two predictors have between membrane and
other classes. The disagreement $\delta_{pixel}(u_i)$ between these two
margins will increase when the confidence distributions deviate from
one another. A few unlabeled samples with the largest disagreement
value $\delta_{pixel}(u_i)$ will be selected as the next set of queries
to be presented to the user. After the termination of the training
process, the real-valued confidences of the classifier (RF in our case)
are used for the subsequent tasks.
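The margin and disagreement computations can be illustrated with a short sketch. It follows one reading of Equations 7 and 8 (zero everywhere except at the winning class, which holds either the membrane confidence or the gap to it); the function names, the choice of index 0 for the membrane class and the example confidences are assumptions made for illustration.

```python
def margin(conf, m=0):
    """Margin vector with respect to class m (three cases of Equation 7)."""
    a = max(range(len(conf)), key=lambda c: conf[c])    # winning class
    out = [0.0] * len(conf)
    out[a] = conf[m] if a == m else conf[a] - conf[m]
    return out

def pixel_disagreement(p, g, m=0):
    """Squared distance between classifier and propagation margins (Eq. 8)."""
    pm, gm = margin(p, m), margin(g, m)
    return sum((x - y) ** 2 for x, y in zip(gm, pm))

def select_queries(P, G, n_queries, m=0):
    """Indices of the pixels on which the two predictors disagree most."""
    order = sorted(range(len(P)),
                   key=lambda i: pixel_disagreement(P[i], G[i], m),
                   reverse=True)
    return order[:n_queries]

# Classes ordered as (membrane, cytoplasm, mitochondria, mito border).
P = [[0.7, 0.2, 0.1, 0.0],    # classifier: membrane
     [0.1, 0.8, 0.1, 0.0]]    # classifier: cytoplasm
G = [[0.1, 0.8, 0.1, 0.0],    # propagation disagrees on pixel 0
     [0.1, 0.8, 0.1, 0.0]]    # propagation agrees on pixel 1
queries = select_queries(P, G, n_queries=1)
```

In this toy case only the first pixel receives a nonzero disagreement, so it is the one that would be presented to the annotator.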




2.3. Active Learning for Superpixel Boundary Classification

The output confidence of the pixel classifier (RF in our case) is
utilized to generate an over-segmentation of the image or volume (see
Figure 2). In order to aggregate the fragments into actual cell
regions, each boundary between two superpixels of this
over-segmentation needs to be classified as a true or false boundary.
We employ a superpixel boundary classifier (RF) that is also trained
using the active learning method. For this training, each datapoint
$v_i$ corresponds to a superpixel boundary.

Initial Subset Selection: In order to reduce redundancy, the initial
labeled set $X_l$ was populated by the centers of the output of a
clustering algorithm such as k-means.

Query Generation: Given the real-valued confidences $q_i$ from the
current classifier and the estimates $h_i$ of the label propagation
method, we use the following formula to compute the disagreement
between them.

$$\delta_{sp}(v_i) = [q_i - h_i]^2 \qquad (9)$$

Note that, since there are only two classes, the values of both $q_i$
and $h_i$ are scalar for superpixel boundary classification. A few
samples with the largest $\delta_{sp}(v_i)$ are selected as the next
query set to be annotated. After the training terminates, the
real-valued predictions from RF are used for superpixel clustering.

3. Experiments and Results 

The proposed algorithm has been tested on both 3D volume and 2D image
segmentation problems. In the following, we describe the experimental
setup, i.e., computation of the intermediate quantities, feature
representation, etc., for pixel and superpixel boundary classification.
Sections 3.2 and 3.3 report the results on 3D and 2D data respectively.

3.1. Experimental Setup 

Pixel classification: As noted earlier, each pixel was classified into
four classes: membrane, cytoplasm, mitochondria, and mitochondria
border. A pixel is represented by features similar to those utilized in
ilastik [31], e.g., Gaussian smoothing, gradient magnitude, Laplacian
of Gaussian, Hessian of Gaussian and its eigenvalues, structure tensor
and its eigenvalues, etc., computed at different scales. The similarity
value $w_{ij}$ for a pair of examples $\{u_i, u_j\}$ was generated by
the Gaussian distance between their feature representations, computed
from the feature values of $u_i$ and $u_j$ and the covariance matrix
among all feature vectors.
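A covariance-normalized Gaussian similarity of this kind can be sketched as below. The exact expression is not reproduced here, so the code makes two labeled assumptions: the covariance matrix is simplified to its diagonal (per-feature variances), and the similarity is taken as exp(-0.5 * squared normalized distance). The `threshold` argument mirrors the predefined similarity cutoff mentioned in Section 2.1; all names are hypothetical.

```python
import math

def gaussian_similarity(features, threshold=0.0):
    """Sparse similarity matrix from per-feature-normalized distances.

    Assumption: a diagonal covariance (one variance per feature) stands in
    for the full covariance matrix among all feature vectors."""
    n, dim = len(features), len(features[0])
    mean = [sum(x[f] for x in features) / n for f in range(dim)]
    var = [sum((x[f] - mean[f]) ** 2 for x in features) / (n - 1)
           for f in range(dim)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((features[i][f] - features[j][f]) ** 2 / var[f]
                     for f in range(dim))
            w = math.exp(-0.5 * d2)
            if w > threshold:                 # keep only strong neighbors
                W[i][j] = W[j][i] = w
    return W

feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
W_dense = gaussian_similarity(feats)
W_sparse = gaussian_similarity(feats, threshold=0.5)
```

Thresholding keeps the weight matrix sparse, which is what allows it to fit in memory when many pixels are sampled.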

Superpixel boundary classification: Given the pixel detection result,
we utilize the predicted confidence values of the membrane class for
generating an over-segmentation by the watershed algorithm [4]. In
order to generate the watershed, we used all the pixels (or clusters of
pixels larger than size 3) with RF confidence for the membrane class
< 0.01. For superpixel clustering, we follow the context-aware
agglomeration approach of [29], which was designed to prevent
under-segmentation by delaying some merge decisions during
agglomeration. This agglomeration scheme first clusters the cytoplasm
superpixels together using a superpixel boundary predictor and then
absorbs the mitochondria bodies into the agglomerated cytoplasm regions
based on their degree of inclusion. A superpixel boundary predictor for
this setup considers the cell boundary as well as the border between
mitochondria and cytoplasm as true boundaries and only the borders
between over-segmented cytoplasm superpixels as false boundaries.

Each boundary is represented by the statistical properties of the
multiclass probabilities estimated by the pixel detector. The
statistical properties include the mean, standard deviation, and 4
quartiles of the predictions generated for the data locations on the
boundary and the two regions it separates, as well as the differences
of these region statistics [29]. All of these features can be updated
in constant time after a merge, a property which improves the
efficiency of the segmentation algorithm substantially. The affinity
values between two superpixel boundaries were computed by the same
formula used for pixel classification.

3.2. Result on 3D segmentation 

We have tested our algorithm for 3D volume segmentation on Focused Ion
Beam Serial Electron Microscopy (FIBSEM) isotropic images collected
from fruit fly retina with a resolution of 10 x 10 x 10 nm. One
$250^3$ volume and two $520^3$ volumes were used as training and test
datasets respectively. The proposed algorithm does not need an
exhaustive pixel-level groundtruth. However, for this particular
experiment, instead of presenting queries to an annotator, we read off
their labels from a noisy pixel-level groundtruth generated earlier for
another study [29]. Each of the segmentation tasks, namely pixel
classification, over-segmentation and subsequent context-aware
agglomeration, was performed in 3D.

The performance of our algorithm was compared against a combination of
[9] and [28] that has been one of the top scorers of the SNEMI 3D
segmentation challenge 2013 (http://brainiac2.mit.edu/SNEMI3D). The
neural net for pixel prediction was trained with the same techniques
described in [9] [12]. In order to further improve the quality of the
probability maps, the outputs on rotated images were averaged together
[10]. The watersheds were generated in the same manner as those of the
proposed method and then the agglomeration technique of [28] was
applied for superpixel clustering.

(a) Split-VI test vol1 (b) Split-RE test vol1 (c) Split-VI test vol2 (d) Split-RE test vol2

Figure 4. Quantitative evaluation of competing methods on two FIBSEM
test volumes. Left and right pairs of plots show the split-VI and
split-RE errors of two methods on volume 1 and 2 respectively.

We report the under- and over-segmentation errors separately because
under-segmentation is costlier than over-segmentation in terms of
manual correction. Given a groundtruth, GT, and a segmentation, SG,
split versions of Variation of Information (VI) [26] and Rand Error
(RE) [19] were selected for performance evaluation. For split-VI, the
over- and under-segmentation are quantified by the conditional
entropies $H(SG \mid GT)$ and $H(GT \mid SG)$ respectively. The
over-segmentation and under-segmentation quantities in Rand Error are
the ratios of pixel pairs within the same cluster in GT but different
clusters in SG and vice versa.
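The two halves of split-VI can be computed directly from label co-occurrence counts. The sketch below assumes entropies measured in bits and flat label sequences; the function names are illustrative. A false merge leaves the groundtruth label uncertain given the segment, which raises $H(GT \mid SG)$, while a false split raises $H(SG \mid GT)$.

```python
import math
from collections import Counter

def conditional_entropy(a, b):
    """H(a | b) in bits for two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))      # co-occurrence counts of (a, b) labels
    marg_b = Counter(b)             # counts of each b label
    return -sum(c / n * math.log2(c / marg_b[y])
                for (_, y), c in joint.items())

def split_vi(gt, sg):
    """Under- and over-segmentation parts of Variation of Information."""
    return {"under": conditional_entropy(gt, sg),   # false merges
            "over": conditional_entropy(sg, gt)}    # false splits

merged = split_vi(gt=[0, 0, 1, 1], sg=[0, 0, 0, 0])
split = split_vi(gt=[0, 0, 0, 0], sg=[0, 0, 1, 1])
```

Plotting the two components on separate axes, as in Figure 4, keeps a method with few merges from being confused with one that has few splits.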

The proposed algorithm has been trained and applied 6 times to assess
its consistency. In each training pass, we randomly subsampled a set of
pixels from the whole training set so that the weight matrix $W$ used
in label propagation contains ~ 0.5% nonzero values and still fits in
the available memory. The remaining parameters of the proposed active
learning scheme are fixed to initial set size = 4000 (1000 each class),
query set size = 10, number of queries = 800 for all the experiments
reported in this paper. For the superpixel boundary learning, the
parameters are set for all experiments to initial set size = 3.5% of
total number of boundaries, query set size = 10, number of total
boundaries labeled = 15% of all examples (10000 ~ 140000 in total).
With our current implementation, the computation of the pixel and
superpixel training schemes needed around 24 hours on a 32 and 16 core
cluster node respectively.

In Figure 4, we plot the split versions of the error measures: x and y
axes correspond to under- and over-segmentation errors respectively.
Ideally, a segmentation algorithm should attain an error rate of 0, and
therefore be plotted at the origin of the graph. For both the proposed
method and that of [9] [28], the points on the plot were calculated by
varying the stopping point of the agglomeration algorithm. The curve
corresponding to the proposed method is an average of performances on 6
trials. On the two FIBSEM test volumes, the proposed algorithm (blue
-o-; cyan and green curves are explained later) consistently produced
lower false merge errors than that (red -x-) of [9] [28] at the same
over-segmentation error level.

The combined methods of [9] [28] generally attained high quality
segmentation in most areas of the test volumes. However, because they
do not emphasize the membrane class for training, their outputs were
vulnerable to false merges near relatively weaker membranes. In
Figure 1, we have displayed the false merge generated by [9] [28]
operating at agglomeration threshold 0.15 (highest point on the red
curve of Figure 4(c)) on test volume 2. The segmentation produced by
the proposed method did not reproduce this or any other false merge of
similar size; the output of our method is shown in Figure 5(a)-(c) for
the same three planes. The qualitative results from the proposed method
were generated with an agglomeration threshold of 0.3 (halfway along
the blue curve of Figure 4(c)). In both these images, the segmented
regions are overlaid on the raw data with random colors. Adjacent
regions with the same color are not necessarily merged. The qualitative
outputs on the two test volumes and a Python script to visualize them
can be found at
https://www.dropbox.com/sh/35x0z6md064yo88/AAAbH6JUwAwDKITDNnSsVEKga?dl=0.
A video of the output is also uploaded to YouTube at
https://www.youtube.com/watch?v=osJtSJ8CS04.

(a) Plane 50 (b) Plane 220 (c) Plane 484

Figure 5. (a)-(c): Result of the proposed method on the same three
slices as displayed in Figure 1. The output contains no false merges of
significant size.

In fact, both the test volumes were under-segmented in the watershed
computed from [9]. The VI errors for under-segmentation for a watershed
on [9] output were 0.132 and 0.236 respectively for the two test
volumes, as opposed to 0.0188 and 0.0243 on average for those computed
from our method. Such an outcome may not be obvious from an examination
of the pixel probabilities computed by the proposed method and [9];
example predictions on Plane 484 are displayed in Figure 6(a)-(c).
Indeed, the overall accuracy of our pixel detector is less than 90% on
samples whose labels are unknown to the active algorithm. Although the
deviation measure defined for active learning of pixel detection in
Section 2.2 enables the identification of misclassified locations, the
gain in classification accuracy is not the prominent factor
contributing to the low under-segmentation error of our technique.

(a) Membr. from [9] (b) Prop. membr. (c) Prop. mitochond. (d) % pixel $p^m$ < 0.01 (e) SP accuracy (f) Error in SP query set

Figure 6. (a)-(c): Pixel predictor confidences on Plane 484: (a) by
[9], (b) and (c) by the proposed method for boundary and mitochondria
class respectively. (d): percentage of pixels with $p^m$ < 0.01 plotted
against number of iterations. (e): increase in superpixel (SP) boundary
classification accuracy with number of iterations. (f): prediction
errors of the classifier and label propagation on every 10 query sets
(100 samples) during superpixel boundary classification.

The proposed pixel detection algorithm inherently minimizes the number
of boundary pixels (and maximizes the number of other types of pixels)
receiving a confidence $p^m < 0.01$. Such an outcome is conducive to
minimizing false merges in the subsequent watershed method. In
Figure 6(d), we plot the percentage of pixels of membrane (blue o) and
other classes (red x) with $p^m < 0.01$ against the number of
iterations. By construction, the algorithm starts with a very low
percentage, approximately 0.01%, of membrane pixels with
$p^m < 0.01$. With the progression of the iterative updating of
training examples, the proposed approach increases the percentage of
other pixels with $p^m < 0.01$ while maintaining that for membrane
pixels at the initial value.

In the case of the superpixel boundary classifier, however,
the training scheme effectively reduces the classification
error in distinguishing false boundaries from correct ones.
In Figure 6(e), we plot, on test samples, the accuracy of the
classifier being actively trained (blue curve) and that of the
one learned from all examples (black dashed line). The plot
shows a steady performance improvement with query
iterations (x-axis). Interestingly, the error rates of both
predictors, namely the label propagation and the classifier,
on the query sets drop to zero after a certain number of
iterations, as shown in Figure 6(f). Such behavior was
observed in all trials of superpixel boundary training and
was utilized to determine a stopping criterion for training.
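
One way such a stopping rule could be expressed is sketched below. The exact criterion is not spelled out in this section, so the rule shown (stop once both predictors make no errors on the most recent window of queried samples) and the name `should_stop` are assumptions made for illustration; the default window of 100 samples mirrors the evaluation interval of Figure 6(f).

```python
def should_stop(pred_propagation, pred_classifier, true_labels, window=100):
    """Return True when both predictors are error-free on the last `window` queries.

    The three lists are aligned: entry i holds the label propagation prediction,
    the classifier prediction and the annotator's label for the i-th queried sample.
    """
    if len(true_labels) < window:
        return False
    recent = range(len(true_labels) - window, len(true_labels))
    errors_prop = sum(pred_propagation[i] != true_labels[i] for i in recent)
    errors_clf = sum(pred_classifier[i] != true_labels[i] for i in recent)
    return errors_prop == 0 and errors_clf == 0


# Toy query history (made up) with a window of 4 samples.
truth = [0, 1, 1, 0, 1, 0, 0, 1]
prop = [1, 1, 1, 0, 1, 0, 0, 1]   # wrong on the first query only
clf = [0, 1, 0, 0, 1, 0, 0, 1]    # wrong on the third query only
print(should_stop(prop[:4], clf[:4], truth[:4], window=4))
print(should_stop(prop, clf, truth, window=4))
```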

We did not reach a point of zero error rate on the query
set for pixel classification. In order to test the sensitivity
of the stopping criterion, we have plotted the error curves
with 700, 800 and 1000 queries in Figure 4 in cyan, blue
and green respectively. The almost overlapping error curves
suggest that the training converges in practice around 800
queries. The distributions of pixels in the whole dataset and
of those selected by the active semi-supervised algorithm
after termination are provided in Table 2. Our algorithm
selected more samples from the membrane class than one
would obtain by randomly sampling the same number of
examples.
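
The shift towards the membrane class can be quantified directly from the percentages reported in Table 2. The short calculation below is only an illustration: it divides each class's share in the actively selected set by its share in the whole dataset, which is also the share expected under uniform random sampling.

```python
# Class percentages copied from Table 2.
whole = {"cytoplasm": 72.43, "membrane": 12.94, "mitochon.": 11.01, "mito border": 3.62}
selected = {"cytoplasm": 52.09, "membrane": 30.59, "mitochon.": 14.78, "mito border": 2.54}

# Ratio above 1: the class is over-represented relative to random sampling.
ratios = {name: selected[name] / whole[name] for name in whole}
for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:.2f}")
```

The membrane class has the largest ratio, roughly 2.4, while cytoplasm is sampled at about 0.7 times its share in the data.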

To further test the parameter sensitivity and robustness 
of our algorithm, we applied the proposed training with 


Table 2. Percentage of pixels in different classes in the whole dataset and the training set selected by the proposed pixel detection algorithm.

                  cytoplasm   membrane   mitochon.   mito border
Whole dataset     72.43       12.94      11.01       3.62
Active selected   52.09       30.59      14.78       2.54



(a) Plane 22 (b) Plane 76 (c) Plane 300

Figure 7. Qualitative results of the proposed method on mushroom body FIBSEM data with exactly the same parameters. White boxes mark gaps in the membrane where the
proposed method successfully avoided a false merge.


the exact same parameters on a 250^3 FIBSEM volume
from a different region (mushroom body) of the fly brain
and produced an almost perfect segmentation on a separate
512^3 mushroom body volume. Figure 7 shows outputs
on some of the planes; note how the bias of the proposed
method towards the membrane class resisted false
merges at the membrane gaps marked by white squares.
The segmentation of all 512 images can be found at
https://www.dropbox.com/sh/35x0z6md064yo88/AAAbH6JUwAwDKITDNnSsVEKga?dl=0.
The output has also been uploaded to YouTube at
https://www.youtube.com/watch?v=mKnxxbQtN0g.
Performance of DAWMR [16]: Our effort to test the
capability of DAWMR [16], which is an extended version
of [17], has not yet yielded results comparable to those of
[9] [28]. We attempt to analyze the reason behind this
performance in the following text.

The authors of [16] have kindly generated the affinity
maps computed by the deep network for both our FIBSEM
volumes. We computed the probability maps according to
the authors’ suggestion and applied [28] for superpixel
clustering. At the same over-segmentation level as the
proposed method (y-axis in Figure 4), the result of
DAWMR + [28] contained large incorrectly merged bodies
in comparison to the proposed method and the combination
of [9] [28].

While investigating the reason behind this performance,
we found that the pixel predictions of DAWMR for cell
membrane fade away in several consecutive planes at
multiple locations in the test volume. We show 3 such planes
(339-341) in Figure 8. These areas, some of which are
marked in red in Figure 8, are most probably responsible
for joining two neurites inaccurately during the
agglomeration. Recall that the region statistics of pixelwise
confidences are typically used as features for the superpixel
boundary classifier [3] [28] [30].
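
The effect of a fading membrane prediction on such features can be seen in a small sketch. The feature set below (mean, median, minimum and maximum of the membrane confidence over the pixels where two superpixels touch) is an assumption chosen for illustration; it is not the exact feature definition of [3] [28] [30], and the function name and toy arrays are hypothetical.

```python
from statistics import mean, median


def boundary_features(confidence, labels, a, b):
    """Statistics of membrane confidence where superpixels a and b touch.

    confidence, labels: 2D lists of equal shape (membrane confidence, superpixel id).
    A pixel is counted when it belongs to a or b and has a 4-neighbour in the other.
    """
    rows, cols = len(labels), len(labels[0])
    values = []
    for r in range(rows):
        for c in range(cols):
            here = labels[r][c]
            if here not in (a, b):
                continue
            other = b if here == a else a
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(0 <= i < rows and 0 <= j < cols and labels[i][j] == other
                   for i, j in neighbours):
                values.append(confidence[r][c])
    if not values:
        return None  # the two superpixels are not adjacent
    return {"mean": mean(values), "median": median(values),
            "min": min(values), "max": max(values)}


# Two superpixels separated by a vertical membrane (toy data, made up).
superpixels = [[1, 1, 2, 2], [1, 1, 2, 2], [1, 1, 2, 2]]
strong = [[0.1, 0.9, 0.8, 0.1], [0.1, 0.9, 0.9, 0.1], [0.1, 0.8, 0.9, 0.1]]
faded = [[0.1, 0.9, 0.8, 0.1], [0.1, 0.05, 0.1, 0.1], [0.1, 0.8, 0.9, 0.1]]
strong_f = boundary_features(strong, superpixels, 1, 2)
faded_f = boundary_features(faded, superpixels, 1, 2)
print(strong_f["min"], faded_f["min"])
```

In the faded case the minimum and mean along the boundary fall sharply, so a boundary classifier relying on statistics of this kind would be more inclined to treat a true membrane as a false boundary and merge the two regions.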



(a) Plane 339 (b) Plane 341

Figure 8. Top row: two planes of the test volume. Bottom row: the membrane prediction of DAWMR [16] with weak confidences marked in red.


3.3. Result on 2D segmentation - ISBI12 

We have also tested the proposed method for 2D
segmentation on the datasets provided for the ISBI 2012
segmentation challenge
(http://brainiac2.mit.edu/isbi_challenge/home). The
challenge website provides a training set of 30 annotated
images, generated by serial section Transmission Electron
Microscopy (ssTEM) from the ventral nerve cord (VNC) of
the Drosophila larva. We remind the reader that an
exhaustive groundtruth is not required by the proposed
strategy because it automatically identifies the pixels and
superpixel boundaries that need to be labeled by an
annotator. For convenience of experimentation, and to
incorporate some of the mistakes a human annotator would
make in the active learning setting, we generated a noisy
groundtruth by performing a watershed with all cell interior
pixels marked as seeds and read off labels from this
groundtruth.

Table 3. Comparison of F-measure of Rand error provided by the ISBI 2012 website.

        Proposed   [9]    All + [28]
error   0.08       0.05   0.126

A similar set of 30 images, without the groundtruth, was
also provided for test purposes. The proposed method was
applied to this dataset with the same number of samples
and iterations for pixel classification as mentioned in
Section 3.2. The number of examples utilized for superpixel
boundaries is also similar to that stated in Section 3.2. In
Table 3, we show the quantitative performance measures
of our method, of [9] and of another baseline algorithm
that uses all pixels for training the pixel detector (Random
Forest) and the technique of [28] for superpixel boundary
training. Since the groundtruth for the test dataset is not
available, the split versions of VI and RE could not be
computed. A qualitative inspection of the results (at
https://www.dropbox.com/sh/35x0z6md064yo88/AAAbH6JUwAwDKITDNnSsVEKga?dl=0)
suggests that the difference in error values between our
method and that of [9] was most probably caused by
over-segmentation.

For complete neuron reconstruction, the 2D segmentation
results on anisotropic images, such as those of the ISBI 12
dataset, need to be connected across planes by a linkage
algorithm. Linkage algorithms have been shown to
refine some false split errors, but cannot recover from false
merges [11] [33]. It is therefore rational (and may even be
necessary) to prevent under-segmentation at the cost of a
small over-segmentation rate. This strategy will be more
effective on difficult areas of an EM volume characterized
by broken or hazy membranes or dark cell regions. We
downloaded 20 images from two different regions of the
whole larva dataset (http://fly.mpi-cbg.de/) and computed
the segmentation with predictors trained on the challenge
data and the same set of parameters. For [9], the output was
generated by applying watershed after thresholding the pixel
prediction values at 0.3, the same value as that used to
compute the winning entry of the ISBI 12 challenge.

As Figure 9 demonstrates, the proposed method prevents
most of the false merges generated by [9] in these
challenging areas and facilitates more accurate reconstruction
through linkage algorithms like [11] [33]. An emphasis
on learning the membrane class leads to a wall around the
watershed basins that is generally ‘higher’ than those
from [9]. Results on all 20 images can be found at
https://www.dropbox.com/sh/35x0z6md064yo88/AAAbH6JUwAwDKITDNnSsVEKga?dl=0.
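
Why a ‘higher’ membrane wall matters after thresholding can be illustrated with a simplified stand-in for the watershed step. The sketch below does not run a watershed; it only labels the 4-connected regions of pixels whose membrane prediction falls below the threshold, which is assumed here to approximate the basins that the watershed would grow from. The threshold of 0.3 is the value quoted above; the function name and the toy maps are hypothetical.

```python
from collections import deque


def label_basins(membrane_prob, threshold=0.3):
    """Label 4-connected regions of pixels with membrane probability < threshold."""
    rows, cols = len(membrane_prob), len(membrane_prob[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if membrane_prob[r][c] >= threshold or labels[r][c]:
                continue  # membrane pixel, or already assigned to a basin
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the current basin
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and membrane_prob[ni][nj] < threshold):
                        labels[ni][nj] = next_label
                        queue.append((ni, nj))
    return labels, next_label


# Two cells separated by a vertical membrane in the middle column (toy data).
weak_wall = [[0.0, 0.0, 0.9, 0.0, 0.0],
             [0.0, 0.0, 0.2, 0.0, 0.0],   # hazy membrane pixel below 0.3
             [0.0, 0.0, 0.9, 0.0, 0.0]]
high_wall = [[0.0, 0.0, 0.9, 0.0, 0.0],
             [0.0, 0.0, 0.6, 0.0, 0.0],   # same pixel with a higher confidence
             [0.0, 0.0, 0.9, 0.0, 0.0]]
print(label_basins(weak_wall)[1], label_basins(high_wall)[1])
```

With the weak wall the two cells fall into a single basin, a false merge that no later linkage step can undo, whereas the higher wall keeps them apart.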

(a) Input (b) [9] (c) Proposed

Figure 9. Performance comparison with [9] on challenging regions of larva data.
Regions highlighted in white (middle column) are falsely merged by the watershed generated from [9].

Figure 10. Sample queries determined automatically by the proposed method.
Note how the queries were placed at challenging locations in the images, such as patches
between mitochondria and the cell boundary and areas with darker shades.

Figure 10 shows images with some pixel locations (circle
centers) selected as queries by our active pixel training
method. Recall that the query set consists of the challenging
examples: the locations where the estimates of the
two techniques contradict each other. The regions covered
by queries include patches between mitochondria and the
cell boundary and areas with darker shades. These regions
often turn out to be misclassified (or receive low confidence)
by a predictor trained in the interactive setting of [31].
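
The idea of querying the locations where two predictors contradict each other can be sketched as follows. This is not the deviation measure of Section 2.2, which is not reproduced in this section; as a stated assumption, the sketch ranks samples on which the predicted classes differ by the L1 distance between the two confidence vectors. The function name, the class order and the toy scores are hypothetical.

```python
def select_queries(prop_scores, clf_scores, budget):
    """Return indices of up to `budget` samples with the strongest disagreement.

    prop_scores, clf_scores: one list of per-class confidences per sample, from
    label propagation and from the classifier respectively.
    """
    def argmax(vector):
        return max(range(len(vector)), key=vector.__getitem__)

    candidates = []
    for idx, (p, q) in enumerate(zip(prop_scores, clf_scores)):
        if argmax(p) != argmax(q):  # the two predictors name different classes
            gap = sum(abs(a - b) for a, b in zip(p, q))
            candidates.append((gap, idx))
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in candidates[:budget]]


# Confidences for (cytoplasm, membrane, mitochondria) on 4 samples (made up).
prop = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
clf = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.5, 0.4, 0.1]]
queries = select_queries(prop, clf, budget=2)
print(queries)
```

Samples 0 and 3 are never queried because both predictors agree on them; the annotator's effort goes only to the contradictory samples.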

In Figure 11, we show the output confidences from the
label propagation algorithm and the classifier on the first
few samples selected as queries for three classes: cytoplasm
(blue), membrane (green) and mitochondria (brown). The
top and bottom panels correspond to the label propagation
and the RF respectively. The # sign on top of a bar shows
the correct label for that particular sample. The plot shows
how some samples misclassified by the RF classifier were
correctly predicted by the label propagation method and
vice versa. Interestingly, the first sample was not detected
accurately by either of the techniques.

Figure 11. The output confidences from the label propagation and the classifier
on the first 12 examples returned as queries for 3 major classes: cytoplasm (blue),
boundary (green) and mitochondria (brown). The top panel corresponds to label
propagation predictions and the bottom shows those from the classifier (in the opposite
direction; direction does not imply sign). The mark # denotes the correct label of any
particular sample.

4. Discussion

We have proposed a framework for training the classifiers
required by an EM segmentation algorithm so that they
acquire properties suitable for neural reconstruction. On
one hand, the proposed method suggests a strategy to train
without complete groundtruth by automatically selecting a
small fraction of training examples. On the other hand, our
algorithm is designed to minimize the false merge errors,
which are substantially more difficult to correct than the
false split errors. The results demonstrate the merit of our
method for neural reconstruction in comparison to the
existing algorithms.

EM segmentation is a critical element of the neural
reconstruction process that has led to high-impact research
in the natural sciences, in particular neurobiology and
neuroscience [32] [15]. Our approach is designed to expedite
multiple components of the overall reconstruction effort.
For example, the neurobiologist who prepares the tissue
sample currently relies only on visual inspection for sample
quality assessment. A faster training method could assist
the imaging expert in determining the optimal sample
quality based on actual results rather than the raw images.

The authors of [32] made an observation vital to the
manual error correction step: screening 100% of the
segmentation result is impractical due to data size and is
often redundant for extracting the underlying connectome.
An intelligent strategy to automatically spot the areas
needing correction, as proposed in [20], is perhaps essential
for computing the connectome from EM images. The
absence, or near absence, of under-segmentation is a
prerequisite for applying methods such as [20].

As the size of the brain regions that researchers aim to
reconstruct increases, it is anticipated that the appearances
(and therefore the feature distributions) of different
regions of the brain would vary considerably from one
another. An efficient approach for preparing the automated
algorithms may be indispensable for scenarios where one
must train different predictors for different regions of large
datasets.

Finally, although the algorithm is modeled and tested
primarily for EM reconstruction, it has the potential to be
applied in other domains. Techniques that use superpixel
aggregation to produce the final segmentation, e.g., [1] for
cell tracking in light microscopy and [34] for blood cell
segmentation, can utilize our method for efficiency and
performance improvement.